Interpretable clustering using unsupervised binary trees

Authors

  • Ricardo Fraiman
  • Badih Ghattas
  • Marcela Svarc
Abstract

We introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure. The first stage performs a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. The second stage (pruning) considers whether adjacent nodes can be aggregated. Finally, the third stage (joining) joins similar clusters together, even if they do not originally descend from the same node. Consistency results are obtained, and the procedure is applied to simulated and real data sets. The Matlab code for the three stages of the algorithm is provided in the Supplemental Materials.
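
The first stage described in the abstract (recursive binary splits that reduce within-node heterogeneity) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. It assumes heterogeneity is measured by the within-node sum of squared deviations, which the abstract does not specify, and the names `heterogeneity`, `best_split`, `grow`, `min_size`, and `min_gain` are hypothetical. The pruning and joining stages are not covered.

```python
def heterogeneity(points):
    """Within-node sum of squared deviations from the coordinate means.

    This measure is an assumption made for illustration only.
    """
    if not points:
        return 0.0
    dims = range(len(points[0]))
    means = [sum(p[j] for p in points) / len(points) for j in dims]
    return sum((p[j] - means[j]) ** 2 for p in points for j in dims)


def best_split(points):
    """Return (feature, threshold, gain) for the best axis-aligned split."""
    parent = heterogeneity(points)
    best = (None, None, 0.0)
    for j in range(len(points[0])):
        values = sorted({p[j] for p in points})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [p for p in points if p[j] <= threshold]
            right = [p for p in points if p[j] > threshold]
            gain = parent - heterogeneity(left) - heterogeneity(right)
            if gain > best[2]:
                best = (j, threshold, gain)
    return best


def grow(points, min_size=2, min_gain=1e-9):
    """Recursively split the data; leaves hold the points of one candidate cluster."""
    if len(points) < 2 * min_size:
        return {"leaf": points}
    feature, threshold, gain = best_split(points)
    if feature is None or gain <= min_gain:
        return {"leaf": points}
    left = [p for p in points if p[feature] <= threshold]
    right = [p for p in points if p[feature] > threshold]
    if len(left) < min_size or len(right) < min_size:
        return {"leaf": points}
    return {
        "feature": feature,
        "threshold": threshold,
        "left": grow(left, min_size, min_gain),
        "right": grow(right, min_size, min_gain),
    }


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 0.1), (5.1, 0.0)]
tree = grow(data)
```

Because every internal node is a single rule of the form "feature j <= threshold", each leaf can be described by the short list of rules on the path leading to it, which is the sense in which such a tree is interpretable.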

Similar resources

Clustering using Unsupervised Binary Trees: CUBT

We introduce a new clustering method based on unsupervised binary trees. It is a three-stage procedure that, in the first stage, performs recursive binary splits to reduce the heterogeneity of the data within the new subsamples. In the second stage (pruning), adjacent nodes are considered for aggregation. Finally, in the third stage (joining), similar clusters are joined even if they do not descen...

Concept Acquisition for Dialog Agents

Dialog agents capable of autonomously acquiring new concepts are likely to be more powerful than those relying on a fixed set of preprogrammed concepts. kx-trees provide a novel unsupervised learning method for concept acquisition. Through online, incremental, divisive, binary-tree-based clustering, it organizes raw sensory experiences into low-level concepts. Using the same mechanism, it can o...

Learning shape categories by clustering shock trees

This paper investigates whether meaningful shape categories can be identified in an unsupervised way by clustering shock-trees. We commence by computing weighted and unweighted edit distances between shock-trees extracted from the Hamilton-Jacobi skeleton of 2D binary shapes. Next we use an EM-like algorithm to locate pairwise clusters in the pattern of edit-distances. We show that when the tree e...

Indexing Images by Trees of Visual Content

Haim Schweitzer, The University of Texas at Dallas, P.O. Box 830688, Richardson, Texas 75083. Abstract: An unsupervised algorithm for arranging an image database as a binary tree is described. Tree nodes are associated with image subsets, maintaining the property that the similarity among the images associated with the children of a node is higher than the similarity among the im...

Interpretable Multiclass Models for Corporate Credit Rating Capable of Expressing Doubt

Corporate credit rating is a process to classify commercial enterprises based on their creditworthiness. Machine learning algorithms can construct classification models, but in general they do not tend to be 100% accurate. Since they can be used as decision support for experts, interpretable models are desirable. Unfortunately, interpretable models are provided by only few machine learners. Fur...

Journal title:
  • Adv. Data Analysis and Classification

Volume 7  Issue

Pages  -

Publication date 2013